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DETECTION OF BISTABILITY IN PHASE SPACE OF A REAL GALAXY, USING A 
NEW NON-PARAMETRIC BAYESIAN TEST OF HYPOTHESIS 

By Dalia Chakrabarty*'^,, 

£f) ' University of Warwick^ and University of Leicester^ 

f*^ , In lieu of direct detection of dark matter, estimation of the distribution 

£*\j ■ of the gravitational mass in distant galaxies is of crucial importance in As- 

, trophysics. Typically, such estimation is performed using small samples of 

O i. noisy, partially missing measurements - only some of the three components 

^* ■ of the velocity and location vectors of individual particles that live in the 

galaxy are measurable. Such limitations of the available data in turn demands 
Cn . that simplifying model assumptions be undertaken. Thus, assuming that the 

phase space of a galaxy manifests simple symmetries - such as isotropy - al- 
lows for the learning of the density of the gravitational mass in galaxies. This 
n, ■ is equivalent to assuming that the phase space pdf from which the velocity 

^A ' and location vectors of galactic particles are sampled from, is an isotropic 

function of these vectors. 

We present a new non-parametric test of hypothesis that tests for relative 
support in two or more measured data sets of disparate sizes, for the under- 
taken model assumption that a given set of galactic particle data is sampled 
from an isotropic phase space pdfs. This test is designed to work in the con- 
text of Bayesian non-parametric, multimodal inference in a high-dimensional 
^ . state space. In fact, the different models that are being compared are char- 

es acterised by differential dimensionalities of the model parameter vectors. In 

addition, there is little prior information available about the unknown param- 
eters, suggesting uninformative priors on the parameters in the different mod- 
els. The problem of model parameter vectors of distinct dimensionalities and 
the difficulties of computing Bayes factors in this context, are circumvented 
^^ ■ in this test. The test works by identifying the subspace (of the system phase 

C*) | space) that is populated by those model parameter vectors, the posterior prob- 

ability density of which exceed the maximal posterior density achieved under 
the null. The complement of the probability of the null given a data set is then 
the integral of the posterior probability density of the model parameters over 
|r\J . this sub-space. This integral is the probability that the model parameter lives 

in this subspace; in implementational terms this probability is approximated 
as the fraction of the model parameters that live in this subspace. We illustrate 
applications of this test with two independent particle data sets in simulated 
as well as in a real galactic system. 

The dynamical implications of the results of application to the real galaxy 
is indicated to be the residence of the observed particle samples in disjoint 
volumes of the galactic phase space. This result is used to suggest the serious 
risk borne in attempts at learning of gravitational mass density of galaxies, 
using particle data. 
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1. Introduction. One of the burning questions in science today is the understanding of dark mat- 
ter. The quantification of the distribution of dark matter in our Universe, at different scales, is of ma- 
jor interest in Cosmology (de Blok, Bosma and McGaugh, 2003; Hayashi, Navarro and Springel, 2007; 
Roberts and Whitehurst, 1975; Salucci and Burkert, 2000; Sofue and Rubin, 2001). At scales of indi- 
vidual galaxies, the estimation of the density of the gravitational mass of luminous as well as dark 
matter content of these systems, is the relevant version of this exercise. Readily available data on galac- 
tic images, i.e. photometric observations from galaxies, can in principle, be astronomically modelled to 
quantify the gravitational mass density of the luminous matter in the galaxy, (Bell and de Jong, 2001; 
Gallazzi and Bell, 2009); such luminous matter is however, only a minor fraction of the total that is 
responsible for the gravitational field of the galaxy since the major fraction of the galactic gravitational 
mass is contributed to by dark matter. Thus, the gravitational mass density of luminous matter, along 
with constraints on the gravitational mass density of the dark matter content, if available, can help the 
learning of the total gravitational mass density. However, the learning of this physical density is diffi- 
cult in light of the sparse and noisy measurements that are available, coercing the practitioner to resort 
to undertaking simplifying model assumptions. In this paper, we present a new way of quantifying the 
relative support to such a model assumption in two independent data sets, by comparing the posterior 
probabilities of models given the two data sets. 

Model selection is a very common exercise faced by practitioners of different disciplines, and sub- 
stantial literature exists in this field (Barbieri and Berger, 2004; Berger and Pericchi, 2001; Casella et al., 
2009a; Chipman, George and McCulloch, 2001; Ghosh and Samanta, 2001; Kass and Raftery, 1995; 
O'Hagan, 1995). In this context, some advantages of Bayesian approaches, over frequentist methods 
has been reported (Berger and Pericchi, 2004; Robert, 2001). 

Much has been discussed in the literature to deal with the computational challenge of Bayes factors 
(Casella et al., 2009b; Chib and Jeliazkov, 2001; Han and Carlin, 2000, to name a few). At the same 
time, methods have been advanced as possible resolutions when faced with the challenge of improper 
priors on the system variables (Aitkin, 1991; Berger and Pericchi, 1996a; O'Hagan, 1995). However, 
the computation of posterior, intrinsic or fractional Bayes factors persist as a challenge, especially in 
the context of discrete, non-parametric and multimodal inference on a high-dimensional state space 
(Link and Barker, 2006). 

This paper demonstrates a novel methodology for quantifying support in two (or more) indepen- 
dent data sets from the same galaxy, for the hypothesis that the system state space admits a certain 
symmetry - namely, isotropy. Assuming isotropy implies that the probability density function in this 
high-dimensional state space of the system, depends only on the magnitude of the state space vector, 
and does not depend on the inclination of this vector to a chosen direction. 

The null is nested within the alternative in this problem. At the same time, one of the data sets is small 
and the other large. Also, the dimensionalities of the model parameter vectors sought under the different 
hypotheses, are also different. Indeed in such a case, testing with multiple hypotheses can be performed, 
such that one hypothesis suggests that the state space that one of the data sets is sampled from, admits 
this symmetry while the other hypothesis suggests the same for the other data set (considering the case 
of 2 available data sets). However, in the two cases, little prior information are available on the system 
parameter vectors. The priors on the model parameters are in fact non-informative (uniform), marked by 
unknown multiplicative constants. Then, as discussed in (Berger and Pericchi, 1996b; Kass and Raftery, 
1995), this results in the ratio of the predictive densities of the data sets, under the models, being defined 
only up to the ratio of these unknown multiplicative constants. Thus, the Bayes factor is rendered arbi- 
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trary. The computation of the Bayes factors is then, in principle possible with posterior Bayes factors 
(Aitkin, 1991), intrinsic Bayes factors (Berger and Pericchi, 1996a,b) or with fractional Bayes factors 
(O'Hagan, 1995). 

However in this real-life problem that we discuss here, the implementation of Bayes factors gets 
difficult given that the relative support to an undertaken assumption in two distinct data sets is sought by 
comparing the posterior probability of models that are characterised by different numbers of parameters. 
Secondly, the more complex model in which the simpler model (the null) is nested in this situation, is 
intractable. The null or the simpler model assumes the model parameter space to bear a simple symmetry, 
thus rendering posterior computation possible. On the contrary, the alternative (or the more complex 
model) does not constrain the geometry of the model parameter space in any way. Thus, the simpler 
model is nested within the more complex model. However, under the alternative hypothesis, i.e. in lieu 
of such simplifying assumptions, it is not possible to compute the posterior probability of the model 
parameter vectors. This situation is in principle caused by the complexity of the system, compounded 
by the paucity of measurements. In the test presented here, learning can be performed - relatively easily 
under the null - and thereafter, support in the data for the assumption about the symmetry of the model 
parameter space is quantified. 

Lastly, it is acutely challenging in this high-dimensional non-parametric situation, to achieve intrin- 
sic priors (Berger and Pericchi, 1996a) with imaginary training data sets. Such training data can be 
generated from the posterior predictive distribution under the null (Perez and Berger, 2002), and is sub- 
sequently used to train the prior under the more complex model, to implement it in the computation of 
the marginal density of the real data. However, the computational difficulty involved in the training of 
the intrinsic prior even under the null, in this high-dimensional setup, is discouragingly daunting. 

The implementation of real training data is not possible either, given that any method implemented 
to learn the gravitational mass density - particularly the similarly motivated Bayesian methodology 
discussed below - might be misled by undersampled data that is likely to result if subsamples are taken 
from the already small observed data sets that typify this application. 

It was presented in the last paragraph that the generated training data cannot be of practical use in real- 
life applications of the kind we consider here. However, the implementation of real training data, for the 
purposes of achieving intrinsic priors, is not possible either. Such real training data is obtained as a sub- 
sample of the available data set. Given that for the current application, the data sets are typically small 
to begin with, estimation of the unknown model parameter vectors using a still smaller (sub)sample of 
measurements might be risky in terms of convergence of undertaken inference methods. 

The difficulties with the computation of Bayes factors that we have delineated above, motivates the 
need for a test that allows for computation of the comparative support in an available data set for one 
hypothesis to another, and is operational even with non-informative priors, without needing to invoke 
any other than the available data set, in the computation of the posterior probability distribution of 
the hypothesis given the data. The test is also motivated to work irrespective of the dimensionality 
of parameter spaces. Motivated by this framework, this paper introduces a new nonparametric test of 
hypothesis that tests for the existence of global symmetries of the phase space that the available data are 
sampled from. The test works in parameter space, in the context of non-parametric Bayesian inference 
when for the two or more cases of differently sized samples, little and/or differential prior information 
are available. 

This new test involves partitioning the space of the model parameter vectors such that one of the 
partitions - the subspace T - contains those parameters for which the posterior probability density given 
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the data, exceeds the maximal posterior density that can be achieved under the null. Outside T lie the 
parameter vectors for which the posterior probability falls below this maximal posterior under the null. 
The model parameter vectors that lie in T are those that underlie the support in the data against the null. 
Thus, the probability of the null given a data set is the complement of the posterior probability density 
integrated over the subspace T (instead of over the whole of the parameter space). Such an integral can 
also be viewed as the probability that a model parameter vector is in the subspace T. In this treatment, the 
probability of a null given the data that is computationally simple to achieve, in the context of Bayesian 
nonparametric inference in high-dimensional state spaces. 

The paper is organised as follows. Section 2 discusses the application, in the context of which the 
new test is introduced. The formulation of a phase space pdf as an isotropic scalar-valued function 
of the velocity and location vectors of a galactic particle, is discussed in Section 3.1. The details of the 
Bayesian estimation of the unknown parameters of the galaxy, is discussed in Section 3.2. The estimated 
functions of a real galaxy are presented in Section 3.3. The null hypotheses that we test are motivated in 
Section 4. In Section 4.1, we discuss the availability of priors on the unknown functions in the relevant 
literature and how this affects the testing. Thereafter, shortcomings of the Bayes factor computations in 
the context of this application are delineated in Section 4.2. In Section 5.2, the outline of the new test is 
presented; in particular, Section 5.4 is devoted to a detailed discussion about the implementation of this 
new methodology. The test is illustrated on simulated data in Section 6 and on real data in Section 7. 
The paper is concluded with a discourse on the implications of the results, in Section 8. 

2. The application at hand. As discussed above, it is difficult to learn the gravitational mass den- 
sity p(x) of dark+luminous matter in galaxies, where p(x) > 0, X G M 3 . p(x) is positive definite, im- 
plying that in any infinitesimally small volume dx inside the galaxy, gravitational mass is non-zero (and 
positive). Other physically motivated constraints on p(x) will be discussed in the next section in the con- 
text of learning this function, given the data at hand. While photometric measurements are more readily 
available, direct detection of dark matter has hitherto been impossible, implying that measurements that 
can allow for quantification of the gravitational mass density of dark matter, are not achievable. Instead, 
there are effects of the total (dark+luminous) gravitational mass that can be measured, though astro- 
nomical measurements that bear signature of such effects are hard to achieve in "early-type" galaxies, 
the observed image of which is typically elliptical in shape 1 . Of some such astronomical measurements, 
noisy and partially missing velocities of individual galactic particles have been implemented to leam 
p(x) (Chakrabarty and Raychaudhury , 2008; Cote et al., 2001; Genzel et al., 2003). From this, when 
the astronomically modelled gravitational mass density of luminous matter, pi(x), is subtracted, the 
density of the gravitational mass of the dark matter content of early-type galaxies can be learnt. In this 
paradigm of learning p(x), the data is referred to as partially missing since the noisy measurements of 
only one component, namely, the component along the line-of-sight that joins the observer to the parti- 
cle - of the three-dimensional velocity vector of galactic particles, are typically available. We view these 
measurables as sampled from the phase space W of the system, where W is the space of all the states 
that the system can achieve. For a galaxy, W is the space of the spatial vectors and velocity vectors of all 
galactic particles. On other occasions, the dispersion of the line-of-sight component of the velocity vec- 
tor comprises the measurement. In either case, such kinematic data is expected to track the gravitational 
field of the galaxy in which the sampled particles play. 



'The intrinsic global morphology of such "early-type" galaxies is approximated as a triaxial ellipsoid; in this paper we 
discuss gravitational mass density determination of such galaxies only. 
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We ask the question that if the available data include velocity measurements of particles in distinct 
samples that live in mutually disjoint volumes of W, what are the implications for the gravitational 
mass density estimated using such data sets? One major implication is that the estimate of p(x) obtained 
using one data set will be in general, different from that obtained using another data set. Of course, 
for the same system, distinct p(x) are impossible - in fact, such a fallacious result, if achieved, can be 
explained as due to the fact that the used data sets have been drawn from mutually insulated volumes of 
the galactic phase space, the pdfs of which do not concur. Such is possible, if the galactic phase space 
is characterised by disjoint volumes, motions in which do not communicate with each other. This is a 
viable scenario in non-linear dynamics even for systems with moderate complexity; it is possible that 
the two types of galactic particles, the data of which are measured, live in distinct basins of attraction 
that characterise the galactic phase space (Thompson and Stewart, 2001). 

In particular, we configure the question of unequal estimates of gravitational mass density functions, 
using the available data sets, to the context of the real galaxy NGC 3379. For this system, multiple data 
sets are measured for two distinct types of galactic particles (Bergond et al., 2006; Douglas et al., 2007). 

3. Estimating gravitational mass density. The 3-D spatial location vector of a particle resident in 
a galaxy is written as X = (Xi,X2,X^) T , where only the coordinates of the image of the particle - 
X\ and X2 - can be measured. Also, the 3-D velocity vector of a particle is V = (V%, V2, V3) 7 ', with 
only the component V3 being a measurable, i.e. we can only measure the speed with which the particle 
is approaching us (the observer) or receding from us. The galactic phase space W is the space of V and 
X of all galactic particles. Thus, W C M 6 . 

3.1. Assuming an isotropic phase space pdf. If we are convinced that motion tracks the gravitational 
field due to a given gravitational mass density function, then we can write down v = £[p(x)], where 
£(■) is a function of p(x). In Chakrabarty and Portegies Zwart (2004); D.Chakrabarty (2009), the effort 
has been to learn p(x) from the data using non-parametric modelling. The aim in these work is to make 
inference on the gravitational mass density p(x) given observations of only 1 component V3 out of 3 
components of the velocity vector of particles resident in the system, where the dependence of V3 on 
p(x) is intuitively motivated. 

In order to render the problem tractable, we need to undertake some simplifying model assumptions. It 
is useful in such situations to assume something simple about symmetries of the system state space. The 
simplest such assumption considers phase space W to be isotropic, i.e. for the phase space probability 
density function /(x, v) > 0, to be isotropic in X and V. Thus, the phase space pdf is / : W — ► 
]R>o- In general, the pdf of W has a time dependence, but here we assume the system to have attained 
stationarity, i.e. for the phase space pdf to be independent of time. Thus, one model assumption that we 
undertake is that of stationarity of the phase space pdf. Such is typically undertaken in the modelling of 
galaxies since deviation from stationarity cannot be checked in the available data since the data sets offer 
snapshots at a (nearly) unique time point in the evolution of the system, the time scale of which ~ 10 9 
years, (Binney and Tremaine, 1987). There are reports of observational signatures of non-stationarity; 
in NGC 3379 these could be considered to include the X-ray image of NGC 3379 (Pellegrini and Ciotti, 
2006) 2 . All in all, at the level of this paper, we cannot say that deviation from equilibrium is ruled out. 



2 Pellegrini and Ciotti (2006) report the gravitational mass of the galaxy modelled using X-ray measurements, as overesti- 
mated by a factor of about 2, compared to hydrodynamical modelling that assume hydrostatic equilibrium. Pellegrini and Ciotti 
(2006) invoke the imaged out-flowing nature of the hot gas from this galaxy to support deviation from hydrostatic equilibrium. 
However, the contribution of such deviations from hydrostatic equilibrium, towards the estimate of gravitational mass has not 
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Thus we state that it is under this model assumption of stationarity, that we interpret all results. The 
phase space pdf is then a scalar-valued, function of 2 vectors, X and V. 

Now, if a general real-valued function <?(•, •) of two vectors a, b e M. m , is an isotropic function, then 
g(a, b) = g(Qa, Qb), for any orthogonal transformation matrix Q (Truesdell, Noll and Antman, 2004; 
Wang, 1969). We recall from the theory of scalar valued functions of two vectors, that if g(-, •) is an 
isotropic function, its set of invariants (with respect to Q) is Tq = {a a, bb, ab}. Then, the isotropic 
function of two vectors, g(&, b), admits the representation q{Tq) (Liu, 2002; Truesdell, Noll and Antman, 
2004). In general, Tq is the sum of a function of a • a with a function of b • b and a function of a • b. If 
a and b are orthogonal vectors, a • b=0. 

Then, it follows that 

- under the assumption that the phase space pdf is an isotropic scalar valued function of 2 vectors, 
x and v, it admits the representation /(e), 

- if we can write a general phase space pdf as /(e), it implies that it is an isotropic scalar valued 
function of x and v, 

where e is a sum of a function of x • x with a function of v • v. (The inner product X • V=0),i.e. 

(3.1) e = $(v / x _ ^) + ??(v-v), 

with <£(•) and ry(-) identified as some real-valued functions. 

One physical interpretation of the above representation is that the phase space pdf is isotropic if and 
only if it can be written as a function of the value e of the particle energy, where energy of a particle 
with location X = x and velocity V = v is the function E(x 2 ,v 2 ), i.e. 

(3.2) e = E{x 2 ,v 2 ). 

N.B. the symbol E is to be distinguished from expectation. Now total energy is the sum of potential and 
kinetic energies. Thus, to allow for the physical interpretation of e as the value of the particle energy, we 
choose <I>(- v /x • x) to be the gravitational potential energy, (written as a function of x • x = x\ + x\ + x|) 
and the kinetic energy ??(v • v) = v • v/2 3 . It merits mention that p(x) is a known function of ^(x) as 
given by a standard equation of Physics - the Poisson equation: 

(3.3) V 2 $(x) = -4vrGp(x), 

where V 2 is the Laplacian operator and G is a known constant. 

However, in lieu of information about the nature of the phase space density, and given at least the 
moderate level of complexity that is envisaged to manifest in phase spaces of galaxies, there is no 
justification for opting for an isotropic, over an anisotropic prescription for galactic phase spaces. There 
is a strong logistical motivation for doing so however - the assumption of an isotropic /(x, v) renders the 



been satisfyingly addressed. 

3 The above identification of the dependence of / on particle energy is not motivated merely to invoke familiar ideas of 
physics into the model but is reinforced by the fact that phase space pdf depends on X and V only via constants of motion 

since — = 0, i.e. /(x, v) obeys the Collisionless Boltzmann Equation(Binney and Tremaine, 1987). The only constant of 

motion / that depends on X and V - as in I(x ■ x, v ■ v) - is the particle energy. Thus, an isotropic phase space pdf depends 
on e = E(x 2 , v 2 ) or any regular function of e. Without loss of generality, for our purpose, we suggest that the isotropic phase 
space pdf is /(e). 
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calculations involved in the learning of p(x) relatively easier. The details of this learning, given the data 
and the assumed prescription of isotropy for /(x, v), are discussed in Section 3.2. In principle, learning 
p(x) after relaxing the assumption of isotropy, under a particular restrictive prescription for including 
anisotropy, can still be possible with a much harder computational procedure (Chakrabarty and Saha, 
under preparation). However, it is impossible to learn /?(x) in the general anisotropic cases, given the 
typically partially missing nature of the kinematic data, under an unrestricted model for the anisotropy 
of phase space. Thus, the estimate of p(x) is sensitive to the prescription chosen for the phase space 
symmetiy - isotropic or not, and if anisotropic, to the value of the "anisotropy parameter". This problem 
refers to the lack of identifi ability among solutions for p(x), caused by lack of knowledge about the 
state of isotropy of galactic phase spaces, and is referred to as the mass-anisotropy degeneracy in the 
astrophysical literature (Cote et al., 2001, 2003; Koopmans, 2006; Lokas and Mamon, 2003). 

The problem has been a pressing concern of Dekel et al. (2005) when they advance the possibility 
that anisotropy might be introduced into /(x, v) of galaxy NGC 3379 due to evolutionary reasons, 
which Romanowsky et al. (2003) had ignored. Douglas et al. (2007) disagree that this could explain 
the identification of the low dark matter content in NGC 3379 by Romanowsky et al. (2003) given 
that the existence of anisotropy was included in the data analysis employed in the earlier work. They 
stress that "anisotropy parameter" recovered from their kinematic data and that obtained from the sim- 
ulations presented in Dekel et al. (2005), are similar in nature. 4 . Here, the anisotropy parameter is a 
parametrisation of the deviation from an isotropic phase space density, defined as 1 — (j^(x)/(t|(x) 
(Binney and Tremaine, 1987), where <r^(x) is the variance in the k-th component of V, k = 1, 2. 

However, the suggestion advanced in this paper is more general. Firstly, the learning of p(x) is non- 
parametric, so an explicit parametrisation of anisotropy is not of relevance. Importantly, if the test of 
hypothesis that we develop here, suggests less support in one data set over another, to the model as- 
sumption of an isotropic phase space pdf, it would imply that the two data sets have been sampled from 
distinct phase space density functions that will then be inteipreted as fundamentally different. Such 
unequal pdfs cannot in general be reconciled with each other, merely by adjusting one parameter, the 
physical interpretation of which is the anisotropy parameter as given in astrophysical literature. 

3.2. Non-parametric Bayesian learning ofp(x). In general, the phase space pdf will contain param- 
eters a, i.e. the phase space pdf is /(x, v; a). Given data D of the observable phase space coordinates 
of a sample of N tot particles, the likelihood function £(a|D) can be written using the phase space pdf 
/(x, v; a) since by definition, dPr(D( fc )|a:) = /(x( fc ), v( fc ); a)d 3 xd 3 v, where the data vector of the 
k-th particle is D^ = (xS k \ v^) T and the spatial and velocity vectors of this particle are x.( k > and 
v( fe ) respectively, k = 1,2, . . . , Ntot- However, there are two points of worry at this stage. 

Firstly, we want to learn p(x) but it would appear that the likelihood proposed above does not in- 
clude this function in its definition. Generally speaking, in such situations, we would need to em- 
bed the sought unknown function (p(x)) in the definition of the phase space pdf by suggesting that 
/(x, v; a) = f(^(p(x.), v, ai),x, v, 02), where a := (ocj , Q^) T . This calls for identification of a 
function ^(p(-), •, •) using domain knowledge. The likelihood would then in general be written as 

Ntot 

(3.4) £(a|D«, . . . ,D^ tot )) <x ] J /(*(p(x^), ^ k \ai),x.^M k \a 2 ) 

fc=i 



4 Weijmans and et al. (2009) cannot infer the distribution of the total gravitational mass distribution in this galaxy us- 
ing a parametric methodology since they cannot obtain information on certain model parameters. Coccato et al. (2009) and 
Pierce and et al. (2006) report the characterisation of this galaxy using available kinematic data. 
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where the data are considered to be lid. 

In our case, we have already identified that the admissible representation of an isotropic phase space 
pdf, is as a function of particle energy, (Section 3.1), which is itself the sum of potential and kinetic 
energies. We have defined e := E(x. • x, v • v) = ^(-^x • x) + v • v/2, (see Equations 3. 1 , Equation 3.2), 
where 3>(x) depends on the sought unknown function p(x) via Poisson equation (see Equation 3.3). 
Thus, a function that has p(\/x • x) embedded in it, is the energy function E(pc • x, v • v) that we now 
re-write as E(p(y/x • x), v • v), so that 

(3.5) e = E(x\v 2 ) = £(x • x, vv) = £(p(v / x _ ^), v v) = E(p(\\ x ||), || v || 2 ) 

where || • || denotes the Euclidian norm of a vector. To summarise, the dependence of /(•) on p(x) is 
ensured by suggesting that the phase space pdf depends on value of energy function, e. Given this, we 
could suggest that the phase space density is /(£ ; (p(x), v 2 )), x, v, o^)- However, here we assume that 
the phase space is isotropic and encode this assumption by writing the phase space pdf as a function of 
energy alone, with no dependence on any other variables (see Section 3.1). In fact, we assume that phase 
space pdf is /(e), i.e. f(E(p(\\ x ||),|| v || 2 )) (by Equation 3.5). 

Now, there are no universally accepted parametric forms of the gravitational mass density of all 
matter in galaxies, known of in the astronomical literature. This urges a fully non-parametric model for 
p(|| x ||). Following D.Chakrabarty (2009), we adopt a fully discretised model such that the range of 
|| x || is binned, with || x ||e [|| Xj || — || <5x ||, || Xj ||) defining the i-th X-bin of width || <5x ||. We 
define that when || x || is in the i-th X-bin, p(\\ x ||) := pi, i = 1,2, . . . , N x and define the vector 
p = (pi, P2, ■ ■ ■ , PN X ) T - Thus, p is the discretised form of the unknown function p(|| x ||) that we seek 
to learn from the data. In light of this, we view the energy function as E(p, || v || 2 ) = e. 

In fact, there is again no guideline in the literature as to the nature of dependence of the phase space 
pdf on energy, motivating a non-parametric model for /(e) too. The relevant range of energy values 
is discretised such that e G [e-, — 5 e , €j) defines the j-th E'-bin of width S e , inside which, /(e) = /,-, 
j = 1, . . . , Ne- Thus, we set /(e) = /, for e in the j-th E'-bin, j = 1, . . . , Ne- We define the vector 
f := (/i, /2, . . . , fN E ) T - Indeed f is unknown and we attempt to learn it from the data. 

Our second worry is with the suggestion for the likelihood in Equation 3.6. We recall that the data 
D is partially missing as it does not comprise measurements of all 6 phase space coordinates; in fact 
D := {x\ , x 2 j v 3 }fc=i- Thus, in addition to the line-of-sight speed Vs, the measurables include the 
spatial locations Xi, X2 of the particle on the image of the galaxy. Let (Xi, X2, V^) T eMc W. The 
data is then sampled from the pdf v(x\,X2, ^3) of the subspace M. C W, so that likelihood should be 
defined in terms of v{x\, X2, ^3). In other words, the likelihood function cannot contain the unobservable 
variables X^,V\,V2- Therefore, the definition of likelihood in Equation 3.4 has to be modified and 
written in terms of the projected phase space pdf u(x\ ,x 2 ,w 3 ), where this projected pdf results 
from the projection of f(E) onto the space of the observables, M.. Assuming the observed vectors 
(x\ ,x 2 , V3 ) T , k = 1, . . . , Ntot, to be i.i.d, the likelihood function is then defined as 

Ntot 

(3-6) C(p 1 ,...,p Nx ,f 1 ,...,f NB \B) = l[u(x ( t\x i 2 k) ,vi k) ), 

k=l 

(k) (k) (k)\ 

where the projected phase space pdf for the k-th data point, v(x\ ,x 2 ,v^ '), results from projecting 
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the phase space pdf /(e) onto the subspace M., as in 

i/(4 ,4 >4 ) = f(e)dx 3 dvidv 2 , 

Jx 3 JVi Jv 2 

i.e. v{xf\x\\v\') = f(E(p(xY\x { 2 k \x 3 ),v 1 ,V2,v { 3 k ')dx 3 dvidv 2 , 

JX 3 JVi Jv 2 
(3.7) 

where this projected phase space pdf is normalised at each iteration by a constant given by integrating 
!/(■,■,-) over all possible values of the observables Vi,V 2 ,X 3 . It merits mention that E inside the integral 
is the energy function, not to be confused with expectation. In fact, 

e .— E{p{x t ,x 2 ,x 3 ),vi,v 2 ,v 3 ) 

= $ (V^FP + {4*°} 2 + W 2 ) \p (y/{x?} 2 + {4*°} 2 + W 2 ) + 

(38) (M 2 + M 2 + {vj k) } 2 ) 

— 

In the second step of Equation 3.8, elaboration of energy as sum of the potential and kinetic energies 
invokes the idea that the potential $(•) is determined via Poisson equation (Equation 3.3), given the 
gravitational mass density function p(-). 

The projection from W onto M in Equation 3.7 is performed by decomposing the integrals on the 
RHS as a sum of the contributions to the integrals, from individual E-bins. The mapping from the space 
W/M to the range of e in the j-th E'-bin, is identified for any k (see Chakrabarty and Portegies Zwart, 
2004, for details), j = 1, 2, . . . , N eng . Summing over all j's then provides u{x 1 ,x 2 ,v 3 ). 

Owing to lack of consensus in the literature on the nature of the gravitational mass densities in galax- 
ies and their phase space pdfs, we choose to impose uniform priors for p^ and fj, h = 1,...,N X , 

j = 1,..., N eng . These are respectively, vr (ph) = U\Pio ,Pm] and ^o(fj) = W[0, 1], . Here p^\p^ 
are experimentally chosen constants and the prior on the phase space density is uniform in [0,1] since 
fN eng is normalised to be 1 for the most bound orbit, i.e. for the maximum value of e . 

The priors are used along with the likelihood function (defined above in Equation 3.6) in Bayes rule 
to write the posterior probability density 

N x N E 

7Tl(pi, . . . , p Nx , fl, ■ ■ ■ , /jVbID) OC C(pi, . . . ,p Nx , /l, • • • , /JV B |D) X Yl Ko(ph) J^[ no(fj)- 

h=l j=l 

Above, the posterior density is referred to as ir\{p, f |D) to distinguish it from the posterior probabil- 
ity 7r(p, f |D) that we actually sample from, after convolving 7ri(-, -|-) with the distribution of error in 
the measurements of V3, f er ror(v 3 ). The measurement errors in X\ and X 2 are stated by astronomers to 
be small enough to be ignored. Inputs from astronomers involved in the observed data set are considered 
when modelling the error distribution. Typically, the error distribution is considered Gaussian with zero 



5 It will be seen in the presentation of our results that e is presented as normalised by its maximum possible value $0 := 
$(0). $(0) is determined by $(r) which in turn is determined by using the leamt p in Poisson equation (Equation 3.3). Thus, 
the normalised value energy e£ [0,1]. 
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mean and variance that is suggested by the relevant observational astronomer(s). Thus, 

7r(p,f|D) = 7Tl(>,f|D) * ferror{v S ). 

Then f and p are learnt by sampling from this posterior, using adaptive random-walk Metropolis- 
Hastings (Haario et al., 2006). Some of the approximations that underlie sampling from this posterior 
are now discussed, followed by a brief discussion of the inference. 

This is the approach used by Chakrabarty (2006); Chakrabarty and Portegies Zwart (2004) and 
Chakrabarty and Raychaudhury (2008) who assumed an isotropic phase space pdf. Now, the undertaken 
model assumption of phase space isotropy also allows for the identification of ^/xx. =|| x ||, with the 

spherical radius r, where r^ = y {x{ } 2 + {x 2 } 2 + {x^} 2 , the radial location of the k-th particle 
from the system centre. In other words, the assumption of an isotropic phase space is inclusive of a 
spherical spatial geometry. Then identifying $(•) with the gravitational potential energy , the connec- 
tion between <£(r) and p(r) (Poisson equation) is recalled as —^— ( r 2 — - — I = —A-nGpir) in this 

r z dr \ dr J 

geometry. Then the h-th X-bin discussed above is synonymous to the h-th radial bin, h = 1, . . . , N x . 
The unknown functions abide by the physically motivated constraints that p{r) > 0, /(e) > and 

do{ v i 

— - — < 0. The last constraint is intuitively motivated as valid in a gravitationally bound spherical 

dr 
system that we model the galaxy to be. We view the galaxy as being built by stacking spherical layers 

on top of each other. Then owing to gravity being a force that attracts mass towards the centre, the 

compactness in the packaging of mass is higher near the centre than away from the centre. In other 

words, p(r) increases as r decreases, in general (Binney and Tremaine, 1987). Following the trend of the 

various phase space pdfs discussed in astronomical literature (Binney and Tremaine, 1987), f(E) is also 

treated as a monotonically increasing function of energy E in this implementation of the methodology. 

In the adopted discrete model, a simultaneous learning of discretised versions of two univariate func- 
tions /(e) and p(r) is attempted, i.e. the target is to learn the vector p := (p±, ... , pn x ) T and the vector 
f := (/i, . . . , fN E ) T as proxies for the unknown functions. Given that the gravitational mass density 
is non-negative and monotonically non-increasing function of r, p G 1Z C K x . Again, given that 
/(e) >0,fG7cl^. 

In our implementation, to ensure monotonic decline in the gravitational mass density function, it is the 
difference A p between the gravitational mass densities in the h-th and h + 1-th radial bins that is pro- 
posed, as Ap ~ A/V(Ap , s 2 ), where Mf(c, d) is the folded normal distribution (Leone, Nottingham and Nelson, 
1961) with mean c and variance d, c > 0, d > 0. The motivation behind this choice of the proposal den- 
sity is that over the support M>o, it is a relatively easy density to sample from, while satisfying the 
requirement that in general, Pr(A p ) > when Ap = 0. Here, the current difference between the 
gravitational mass density values in the h and h + 1-th radial bins is A p := p^ — ph+i- The vari- 
ance s 2 is the empirical variance of A p , computed using values of this difference variable from step 
number to to t — 1, where t is the current step number and to is chosen experimentally to be post- 
burnin (Haario et al., 2006), h = 1, . . . ,N X . Here Pn x +i and p~N x +i are defined as 0. Then as h is 
varied from N x to 1, the proposed h-th component of the unknown gravitational mass density vector is 
Ph = Ph+i + A p . fj is updated similarly, while following an imposed constraint that f(E) is mono- 
tonically non-decreasing with energy E, j = 1, . . . , N eng . However, unlike in the case of the updating 
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of the gravitational mass density value, a monotonically non-decreasing phase space pdf with energy 
is not motivated by any physically justifiable constraints, though astronomical literature often suggests 
phase space density functions that typically increase with energy (Binney and Tremaine, 1987). 

Thus, we realise that when the data comprises even partially missing information on the state space 
coordinates of a system, this methodology allows for the formulation of the likelihood of the parameters 
describing the sought system function, in terms of the projection of the pdf of the system state space onto 
the space of the observables, as long as the unknown function can be embedded within the definition 
of this pdf. Such an approach is useful in contexts similar to what we work with here, for applications 
other than the current one. A notable feature of this methodology is that even when the data are missing, 
i.e. sampled from a sub-space of the system state space, it is possible to write the likelihood function as 
a product of the values of the sub-space density, at each data point, where the density of this sub-space 
is obtained by integrating the unobserved variables out of the state space pdf. 

3.3. Bayesian learning of model parameter vectors in real galaxy NGC 3379. NGC 3379 is one of 
the few elliptical galaxies, for which kinematic information is available for individual members of two 
different populations of galactic particles - referred to as planetary nebulae (PNe) and globular clusters 
(GCs) - over an extensive radial range spanning the outer parts of the galaxy. The data used in the 
work include measurements of X\, X2 and V3 of 164 PNe reported by Douglas et al. (2007) and of 29 
GCs by Bergond et al. (2006) 6 . We refer to the PNe data set as Di and the GC data set as D 2 , with 
respective sample sizes of iVi=164 and A^2=29. The learning of the model parameter vectors f andp is 
performed in a high-dimensional state space (dimensionality = N eng + N x ). The traces of the posterior 
given either data are presented in Figure 1. The marginal posterior densities given either data are shown 
for one component of the learnt p vector, namely rhoQ (Figure 3). The marginal densities are noted to 
be markedly multimodal. 

In Figure 2, we present the vectors pM := (p\ , . . . , p% ) T and fW := (/} , . . . , fff ) T , learnt 
from using the i-th data Dj, i = 1,2. This estimation is performed using the aforementioned Bayesian 
nonparametric methodology, under the model assumption of an isotropic phase space pdf , i.e. a phase 
space density that is expressed as /(e) and approximated in the discretised model of CHASSIS as the 
vector f , (the j-th component of which is the value of the phase space density in the j-th energy-bin) and 
the vector p, (the h-th component of which is the value of the phase space density in the h-th radial-bin). 
The learnt 95% HPDs are represented as error bars and the modal values are shown as open circles. 

4. Testing for isotropy. The apparently inconsistent p learnt from two different kinematic data sets 
of two different types of galactic particles in a real galaxy, (Figure 2), translates to multiple gravitational 
potential estimates for the same galaxy, which is of course not physically plausible. Thus, the total 
gravitational mass in a galaxy can be misidentified, leading to erroneous estimation of the fraction of 
the galactic gravitational mass that is contributed to by dark matter which if applicable to a sample of 
galaxies, can in turn bias cosmological ideas about the distribution of dark matter on galactic scales. 

Thus, we have identified the risk in the implementation of kinematic data in learning galactic gravi- 
tational potential. In appreciation of this serious risk, we need to check 

-if the problem is inherent to the very notion of attempting the estimation of gravitational mass density 
using the available data on phase space coordinates of sampled galactic particles, or 



6 We use the velocities of only 29 of all the GCs that Bergond et al. (2006) advance, as these are identified as unambiguously 
bound to the gravitational field of NGC 3379, as distinguished by others that might be in the shared by the fields of this galaxy 
and its neighbours. 



12 



CHAKRABARTY 



Table 1 
Table displaying seeds that individual chains run with data Di and D2 are started with. The initial choice of the 



gravitational mass density function is one that is sometimes used in astrophysical literature, 



f><> 



1 + 



The 



starting phase space density is chosen to be either of the form exp e or e . Here the parameters po, cti, 02, T c , (3 G K>o- The 
chains run with PNe data Di are assigned names characterised by the prefix "PNe-RUN", while runs performed with the GC 

data D2 are labelled with prefix "GC-RUN". 



Name 


Po 


r c (kpc) 
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10 


1.8 
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Fig 1. Logarithm of the joint posterior probability density of model parameter vectors f and p given the two sets 
of real data - 7r(f, p|Di) on the right and 7r(f , p|Di) on the left. 
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Fig 2. Right panel: density profiles recovered from the chains run with real data T>i (data on PNe) and D 2 (data on 
GCs). The p 1 learnt from the chains run with PNe data Di are shown in red, yellow and blue while p 2 learnt with 
the GC data D 2 kinematics are in black, magenta and cyan. As is apparent, p vectors learnt using distinct data 
sets are not consistent with each other. Left panel: isotropic (normalised), phase space density vector $i recovered 
from PNe — RUN I (in red) and f 2 from GC — RUN I (in black), plotted in corresponding energy-bins. 
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Fig 3. Marginal posterior probability density of the 6-th component of the p vector learnt using observed data Di 
(left) and D 2 (right). The plots manifest multimodality of the marginal posterior densities of the 6-th component 
of the model parameter vector p. 
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-if there is a problem in our Bayesian methodology discussed above, or 

-if the data implemented in this example are biased to produce the recovered inconsistency between pW 
and p( 2 ) learnt using the 2 data sets Di and D2 respectively. 

Here we keep in mind the possibility that these above factors may not necessarily be mutually indepen- 
dent. 

We begin by examining if the assumption of an isotropic phase space that was used in the above 
Bayesian methodology can result in the recovered inconsistency in the p learnt using the 2 data sets. This 
question is addressed by testing for the support in each of the used data sets, towards the assumption of 
an isotropic phase space, using a new test of hypothesis that we present here. 

As there are two data sets - the relative support in which towards isotropic models are sought - we 
define two null hypotheses, such that the i-th null, Hi, proposes that the i-th data set D, is drawn from 
an isotropic phase space pdf ^i(e), i = 1, 2. Here \Pj(e) is some real-valued function of the normalised 
energy e G [0, 1], such that ^(e) > if e < and SS?i(E) = otherwise. We refer to a general 
isotropic phase space pdf as dependent solely on energy e (see Equation 3.8) since we have established 
in Section 3. 1 that isotropic phase space pdfs are functions energy alone. Then under the null that D^ 
is sampled from ^(e), ^i(e) = fj, where f; the phase space pdf learnt using data Dj in the isotropy- 
assuming Bayesian methodology CHASSIS. Thus, 

(4.1) H i :t i = V i (e), i = l,2, 

for each i = 1,2. 

Here we are not making any pre-emptive assumptions about the concurrence of the phase space 
densities that the two data sets are drawn from. If of course /i(x, v) and /2(x, v) are isotropic and 
coincide with each other, then Pr(iJi|Di) and Pr(i?2|D2) wm be similar and both Pr(iJi|Di) and 
Pr(#2|D2) will be high. If /i(x, v) and /2(x, v) coincide but neither is an isotropic function of x and 
v, Pr(#i|Di) and Pr(F 2 |D 2 ) will be similar but neither Pr(#i|Di) nor Pr(iJ 2 |D 2 ) will be high. If 
/i(x, v) and /2(x, v) do not coincide but are both isotropic, Pr(i7i|Di) and Pr(i?2|D2) W1 U both be 
high. If f\ (x, v) and /^(x, v) do not coincide and only one is an isotropic function, it is likely that either 
Pr(iJi|Di) or Pr(i?2|D2) will be high and the other low. Given this framework, we elaborate on the 
motivation for a new test, developed to work in this context of Bayesian non-parametric learning of the 
unknown functions, subsequent to an examination of the priors available for the hypotheses as well as 
for the model parameters. 

4.1. Handling priors. We first test for H t as supported by Dj; conventionally, this is done by com- 
puting the posterior odds Pr(iJj|Dj)/[l - Pr(ifj|D;)] (Goodman, 1999; Kass and Raftery, 1995). It 
then appears that we should be comparing the posterior odds of the isotropy-assuming model and the 
model that does not assume isotropy, once given the data set Di and then D2. However, as men- 
tioned in Section 3.1, in this application the posterior probability of model parameter vectors given 
either data set, under the anisotropic model is unachievable. Hence we compare Pr(i/i|Di)/[l — 
Pr(#i|Di)] and Pr(#2|D2)/[l — Pr(#2|D2)] instead. This would entail computation of the Bayes 
factors Pr(Di|#i)/[l-Pr(Di|ifi)] and Pr(D 2 |# 2 )/[l-Pr(D2|# 2 )] and the prior odds Pr(fli)/[1- 
Pr(fli)] and Pr(JT 3 )/[l - Pr(iT 2 )]. 

There is no apriori physical reason to constrain the prior Pr(i^i) as equal to ¥r{H<i), In allowing 
for such flexibility, we are allowing for the system phase space structure to be characterised by at least 
two volumes, the state of isotropy in which are not necessarily the same. Since in complex dynamical 
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systems phase spaces characterised by such insulated pockets are possible, as arising out of non-linear 
dynamical mechanisms, we do not ignore such a possibility. 

Next we ask if there is anything in the astronomical literature to suggest the value of Pr(i7j)/[1 — 
Pr(ilj)] i = 1,2 that is needed in the computation of the Bayes factors. Model-driven numerical sim- 
ulations of galactic phase spaces have been undertaken (Dekel et al., 2005), though it is precisely be- 
cause consensus over the status of isotropy in individual real galaxies has not been achieved by such 
numerical simulations, that the aforementioned mass-anisotropy degeneracy continues to worry the as- 
tronomy community. In contrast to such simulations, Cote et al. (2001) discuss isotropy in phase spaces 
that observed samples of GCs are drawn from, using data from multiple sources that are available for 
the galactic system in question. However, extrapolating these suggestions to the particular instance of 
the galaxy of interest in our work (NGC 3379) is risky since the details of the non-linear dynamics 
responsible for designing the phase space of a galaxy, is particular to the evolution (and structure) of 
that galaxy. Moreover, the data from the multiple sources that Cote et al. (2001) used is not available 
for NGC 3379 and the data for this galaxy includes measurements of PNe that live in this galaxy, not 
just GCs that Cote et al. (2001) considered. Douglas et al. (2007) do indeed discuss the issues regard- 
ing isotropy in the phase space that the observed sample of PNe in the galaxy NGC 3379 are drawn 
from, though they do not provide estimates of the probability of this phase space being isotropic. In 
other words, the work by Douglas et al. (2007) cannot be employed to constrain Pt(Hi). The generic 
simulations of Dekel et al. (2005), will have to be considered to be applicable to the case of this par- 
ticular galaxy, to suggest Yi{H\)/l — Pr(.ffi). However, it is important to note that in their numerical 
modelling, Dekel et al. (2005) consider the galactic phase space to be a monolithic structure, i.e. they 
do not consider the possibility that distinct samples of different types of galactic particles could live in 
insulated volumes of the galactic phase space. Thus, using these simulation-based studies as suggestions 
for priors do not allow for general models. 

The situation here is typical of real-life cases when little information is available towards the prior 
odds. Thus, we can only suggest Pr(iTj) = 1 — Pr(ifj)=0.5, so that the prior odds is unity. 

4.2. Difficulty in computing Bayes factors here. In the implementation of the posterior odds Pr(i7j|Dj)/[l- 
Pr(ifi|Dj)], i=l,2, the computation of the Bayes factors is rendered difficult in light of improper priors 
on the system parameter vector (f W, p^') T . Of course, this is not relevant to applications where priors 

are not improper. Here, priors no{p]l) and 7To(f .• ) are uniform - but over two distinct intervals in values 
of the state space parameter vector; j = 1, . . . , N eng , h = 1, . . . , N x . 

Various resolutions of this problem of improper priors has been suggested, including posterior Bayes 
factors (Aitkin, 1991), intrinsic Bayes factors (Berger and Pericchi, 1996a,b) and fractional Bayes fac- 
tors (O'Hagan, 1995). In line with the discussion in Section 1, the precise difficulties in translating these 
suggestion to the application at hand are enumerated below. 

(i) The Bayesian learning of (f W, p®) T , as delineated above, is possible only if the data distribution 
is such that there is at least one measured value of vs available in each of the N x bins that the 
chosen interval of r is binned into, otherwise the method wrongly interpolates linearly between 
two neighbouring bins belying the non-linearity inherent in the system, 
(ii) However, this worry might appear mitigated in light of the suggestion that the training data 
sets are not required to be real data but can be imaginaiy (Berger and Pericchi, 2001, 2004; 

Cano and Salmeron, 2013; Fouskakis, Ntzoufrasy and Draper, 2012; Perez and Berger, 2002). Thus, 

(£) (£) 

Dj can be sampled from the posterior predictive distribution Pr(D t - |Dj) under the null, i.e. 
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from / Pr(vf ) \f^\p®)7r(fjf\p®\T> i )dp£ ) d$\ (Here we adopt the notation that model pa- 
rameter vectors with subscript "1" refer to those learnt under the more complex model, i.e. the 

alternative, while those with subscript "0" refer to vectors learnt under the null). The training data 

(£) 
D^ thus generated, could then be used to train the prior under the complex model (the model 

in which the simple model or the null is nested), to be then input as the prior in the computation 

of the predictive distribution of the i-th data set under the complex model. In other words, we 

then compute the prior under the complex model as Pr(fj , p\ ) = / 7r(fj , p\ |D^ )dD\ 

and subsequently input this to compute the predictive distribution under the complex model, 

mi(Df } ) = f Pr(Df } |ff \ pf) Pr(ff ) , pf)d^ ] 'dpf , which would be invoked in the Bayes 

factor. However, in our work, it is not possible to compute the posterior probability of the model 
parameter vectors under the complex model that does not assume isotropy but incorporates a gen- 
eral anisotropic phase space pdf within its definition. In other words, the posterior probability of 

(i) (i) 

f{ and p\ though perceived, is not computable under a general anisotropic phase space pdf (see 
Section 3.1). 
(iii) This shortcoming notwithstanding, imaginary training data can in principle be implemented to 
give the posterior odds of the null to its complement, given a data set. However, the computational 
intricacy involved in averaging over all possible imaginary samples is formidable - such averag- 
ing is required to compute the "expected-posterior prior" of the model parameter vectors under 

the null, i.e. / ir{$\pf\T)f ) )ma{T)f ) )dI)f ) (Fouskakis, Ntzoufrasy and Draper, 2012). We 

would then need to generate a large sample of training data sets, and for each these training 
data sets, we would need to learn the unknown model parameter vector under the assumption of 
isotropy. This suggests running as many long MCMC chains to convergence, as there are train- 
ing data sets that the posterior of the model parameter vectors for the null, is averaged over, in 
the numerical computation of the expected-posterior prior under the null. This is required to be 
a large number, if the expected non-linearity in the joint posterior probability of all the unknown 
parameters is to be adequately explored. Given such a computationally intensive method, we seek 
an new method that is numerically less cost intensive. 

In view of this, we present a new distribution-free test of hypothesis that works by taking the full 
posterior structure of the model parameters into account. This test is inspired by the Fully Bayesian Sig- 
nificance Test (FBST) advanced by de B. Pereira and Stern (1999); de B. Pereira, Stern and Wechsler 
(2008). 

5. Method. In our test, a measure of the support in the data Dj for the null Hi, i.e. Pr(i^j|Dj) 
is given by the complement of the integral of the posterior probability density of the model parameter 
vector (f W , p^') T given data Dj, over all those parameters, the posterior of which exceeds the maximal 
posterior under the null. In fact, it is a rephrasing of this integral that allows us to compute Pr(i^j|Dj) 
in this context of non-parametric multimodal inference in high-dimensions. 

In this section we first describe FBST, then introduce this new test and then proceed to describe its 
adaptation for our application. 

5.1. FBST. The Fully Bayesian Significance Test (FBST) presented by de B. Pereira, Stern and Wechsler 
(2008), tests the sharp null hypothesis that the relevant parameter, 9, has a value 9q, i.e. Hq : = 6q. 
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Here 9 is assumed to be distributed continuously in the parameter space 0. In FBST, the evidence in 

favour of Hq is measured by first identifying the parameter 9* for which posterior probability under the 

null, given data D, is a maximum, i.e. 9* = argmaxPr(0|D). de B. Pereira, Stern and Wechsler (2008) 

Ho 
suggest the evidence in favour of Hq to be: 

(5.1) ev = l-Pr(0GT|D), where 

(5.2) T = {9 : Pr(0|D) > Pr(0*|D)}, 

The intuitive justification of the above is the following. Higher is the probability of achieving 9 that 
is more consistent with the observed data than is maximally achieved under the null, less is the sup- 
port in this data for the null; such 9 live in the set T by definition (Equation 5.2). Thus, higher is the 
probability of identifying 9 G T, the less is the support in D for Hq. Then FBST quantifies support in 
the data against the null hypothesis as the integral of the posterior probabilities of 9, for 9 G T. Thus, 
FBST involves identification of 9* via optimisation, followed by integration over T. One aspect of this 
test is that the measure of evidence in favour of the null obeys the Likelihood Principle (Basu, 1975; 
Berger and Wolpert, 208; Birnbaum, 1962). 

From this discussion, it may appear that the way the evidence value ev is defined above, it is not invari- 
ant to re-parametrisation of the parameter space. This problem has been addressed by Madruga, Pereira and Stern 
(2003) and de B. Pereira, Stern and Wechsler (2008) via the suggestion that the formulation of the ev 

Pr(0|D) 

be in terms of the surprise function s(-) relative to a reference density r(-), where s(9) := — — , 

r(9) 

where the reference density on 9 is r : — > R. The purpose of the normalisation of the posterior 

of 9 given the data, by the reference density function r{9) is to ensure that for any transformation of 

0, such as w = H(0), the supremum of the transformed surprise function remains invariant, i.e. the 

measure of evidence in favour of the null, ev, is rendered invariant to re-parametrisation (shown by 

Madruga, Pereira and Stern, 2003, for a bijective and continuously differentiable H(-)). In this revised 

interpretation, the FBST procedure involves identifying 9* = argmaxs(0) and the evidence in favour 

Ho 

of the null is 

ev = 1 - Pr(0 G T|D), where 

(5.3) f = {9 : s(0) > s(9*)}, 

de B. Pereira, Stern and Wechsler (2008); Madruga, Pereira and Stern (2003) suggest possible interpre- 
tations of r(9). If r(9) is treated as the uniform density U(9) in FBST, this effectively equates ev to ev 
(Equation 5.2). 

5.2. The new test. In the non-parametric test of hypothesis that is presented here, the support in the 
data Dj for the null hypothesis Hi is given as 



(5.4) 



Pr(Hj|Dj) = 1- / 7r(0|Di)d0, where 

(5.5) r, = ;, : ^P!) >'*(•*!».: 



r(0) r(0) 

i = 1,2, and 7r(0|Dj) is the posterior probability density of the model parameter vector = 

(/l,/2, • • ■,fN eng ,Pi,P2, ■ ■ -,PN X ) T given data Dj. ir Hi (9\^>i) is the posterior probability density of 
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9 under the null Hi. To ensure invariance to reparametrisation of the parameter space, we define the 

set Ti in terms of the surprise function s(9) := l , where we choose to work with a reference 

density r(9) that is uniform in 9. Here T is defined in Equation 5.5 as the set of those values of the 
parameter 9, the posteriors of which exceed the maximal posterior achieved under the null Hi. Thus, T{ 
consists of those parameter values that are even more consistent with data Dj than is 9*, which is the 

parameter that maximises 



most consistent value under the null. In other words, the value of the mode 
the posterior under the null, 7r^(0|Dj), is identified as 9*, i.e. 9* = arg 

identified 9*, we partition the parameter space 0j into the subspaces Tj anc 
in Equation 5.5. Thus, 



max 7177. (# ID 



Having 
Ti, where Ti is defined as 



@i = TU Ti 
T i = {e:Tr(e\B i )>ir Hi (0*\Vi)} <=*► % = {9 : tt(0|Dj) < ir Hi (9*\Di)} 

We realise that larger is the proportion of parameter values that are more consistent with the data 
than what is maximally achievable under the null, the less is the support in the data for the null. Here, 
"consistency with the data" is measured by the posterior probability density of parameter vector given 
data Dj. Then posterior probability density of all 6 G T given Dj contribute to support in Dj against the 
null. In other words, 7r(0|Dj) integrated over T defines total support against the null as 1 — Pr(#j|Dj). 
This gives Equation 5.5. So, 



(5.6) 



1 - Pr(fli|Di) = / Tr(9\Bi)de or 

Pr(Hi\T>i) = 1 - I jC(9\T>i)TT (9)d9 



IT, 

Thus, unlike Bayes factors - the computation of which involves integrating over the parameter space - 
this test involves integrating over the subspace Tj. 

Now, the posterior probability density of 9 given Dj, integrated over the subspace T, is the probabil- 
ity of 9 to live in this subspace, given the data, i.e. 

(5.7) Pr(0 G Ti\Bi) = I ^(0|D 4 )d0 = 1 - Pr(# 4 |D,). 

Jt, 

Then the integral of 7r(0|Dj) over subspace T is approximated by the proportion of 9 G Tf, this pro- 
portion is determined empirically. This crucially eases the computational burden in the context of this 
non-parametric inference in high-dimensions. 

5.3. Motivation for a Bayesian test. Could a frequentist test have worked for our purpose? That 
would require us to define a test statistic S and to aim at finding the probability that S is more or 
equally unlikely than the value of this statistic, given the data. A possible candidate for S could be 
S := l/£(/3|Dj) where £(/3|Dj) is the likelihood of the "relevant parameters" /3 given data Dj. Such 
a choice of S is motivated to ultimately allow us to reject or not reject (rather than accept or not accept) 
the null within the paradigm of a non-parametric frequentist test. Now [3, the "parameter of relevance" 
to this null must measure how isotropic the phase space pdf is, given that the null states: f; leamt us- 
ing D, is isotropic (Equation 4.1). The concept of a parametrising how isotropic (or anisotropic) the 
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phase space pdf is, exists in the astronomy literature in the form of the anisotropy parameter (discussed 
above in Section 3.1). However, in general there is no measurement available that allows for the quan- 
tification of this anisotropy parameter (Binney and Tremaine, 1987), leading to the worry referred to as 
the mass-anisotropy degeneracy (Cote et al., 2001, 2003; Koopmans, 2006; Lokas and Mamon, 2003). 
In fact, in lieu of such information, in the community, the anisotropy parameter is arbitrarily assigned 
values, and for each such assignment, a new phase space pdf (and p(r)) is obtained from the data (see 
Binney and Tremaine, 1987; Douglas et al., 2007, for examples of this). In other words, £(/3|Dj) can- 
not be achieved. Even for other prescriptions of the test statistic S, it is bound to bear dependence on 
how likely the parameters of relevance are, given the data. However, the computation of this likelihood 
is impossible given the nature of the available information, i.e. S is fundamentally not achievable in this 
application. Above, % = 1, 2. 

In lieu of a way to compute the likelihood of (3, we compute the probability of 7r(0|Dj) < ir^ (0*|D$), 
(Equation 5.6) to subsequently compute Pr(ifj|Dj) (Equation 5.7). Thus, in this framework, we can 
work in terms of the model parameter vector 9 rather than the parameter (3 that measures anisotropy 
of the phase space. Attempting to compare the likelihood of 9 given Dj, with the maximum value of 
this likelihood under the null, is synonymous to comparing Pr(Dj|0) with the maximum value of this 
probability under the null. However, by definition, the likelihood considers variation in data given a 9 
while we need to find the probability of the null given data set, i.e. such a comparison cannot lead to 
Pr(i7o|Dj). Thus, one needs to work in the Bayesian framework. 

Thus, we prefer to work in parameter space and at the same time, given the radically different sample 
sizes of the GC and PNe data, it is imperative that the test be independent of sample-size effects. We 
recall that the sample size of the PNe data is about 5.6 times the size of the GC data. 

For the above reasons, a (non-parametric) frequentist test cannot be useful in this application. 

5.4. Implementation of the new test. In contrast to the parametric FBST discussed by de B. Pereira, Stern and Wee 
(2008), in our work, the the posterior of the system variable, given the data, does not conform to any 
parametric form. Also, to render implementation possible in this context, we avoid a direct computation 
of the integral of the posterior probability density of the model parameter vector given the data at hand. 
Given this, we present our test below. 

The null hypothesis Hi that we aim to test for, is given above in Equation 4.1, i = 1,2. Both 
H\ and H2 represent sharp hypotheses. In our work, the model parameter vector variable is 9 = 
(/1, f 2 , . . . , fN eng ,Pi,P2, ■■■,PN X ) T G T x TZ. For fW and pW learnt using the i-th data set Dj, 
we define 

0«:=((f«) T ,(p«) T ) T z = 1,2. 

5.4.1. Identification of posterior-maximising model parameter vector, under the null. With the aim 
of identifying the subspace Tj, (i = 1,2), we identify the 9"' vector that maximises the posterior 
[fw, pM|Dj] under the null, given the data. This posterior maximising, null abiding vector is referred 
to as fl' 1 *'. In order to identify this vector, the following construct is used. 

• During the posterior inference on 9^' , performed with adaptive Metropolis-Hastings, let the cur- 

(i) 

rent state vector be 9- , in the j-th iteration, j = 1, . . . , Nq, where the chain is Nq steps long. 
Upon convergence, the unknown f W and pW are leamt within 95% HPD credible regions. From 
a given chain, the model parameter vector 9^ Ml > := ((f ( M *)) T ; (p( Ml )) T ) T corresponding to the 
mode of the posterior [f W ; pM |Dj] is identified. 
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• Then from the identified f ( Ml > , at the identified p( M l > , we simulate an N{ -sized data set of the ob- 
servable variables, i.e. X\, X 2 and V 3 . Let this generated data set be Dj gen := {(X^ gen ,X^ ' gen , V^ gen )}^Lv 

• Importantly, we understand that by construction, this data set is sampled from an isotropic phase 
space density since f( Mi ) was learnt under the model assumption of isotropy. This data set is 
simulated from f ( Mi ) at p( Ml > using rejection sampling, according to the following algorithm. 

—GM(r) 

1. We solve Poisson equation (Equation 3.3) in a spherical geometry to get, <3?(r) = — 

r 

where M (r) = / 4irp(s)s 2 ds and G is a known constant. Then discretising this integral, 

Js=0 



we define 

M(r) = ^2^-[q 3 S 3 -(q-lf5 3 } Pq + ^-[r 3 -p 3 S 3 }p p+1 for r e \p5, (p + 1)S), 



47T r , o , „,„,, 47T, 



N ' Air 
M(r) = ^y[g 3 ,5 3 -(g-l) 3 5 3 K for r > N x 5, 

g=l 

A/TT 

M(r) = — [r 3 5 3 ]pi for < r < 5, 
(5.8) 

where 1 < p < iV^ — 1. Here N x 5 is the maximum radius to which data are available and 
p q is the gravitational mass density in the q-fh radial bin, i.e. p{r) = p q if r G [((/ — 1)5, g5), 
g = 1, . . . ,N X (introduced in Section 3.2). This defines $(r) for any r > 0, given the 
identified p( Ml \ For the p( A/j ) learnt using D^ s , we compute $(r) as above. 

2. At this step, we sample the normalised value of the energy function, e defined in Section 3.2. 
As this normalised value G [0, 1], we choose e randomly from U[0, 1], where Here U[a, b] 
is the uniform distribution over the range [a, b], for any a, b € M.. Let the chosen e be such 
that it lies in the t-th energy bin, i.e. e G [(t — 1)<5e, £<5e], t = 1, 2, . . . , iV eng . Over this t-th 
energy bin, let the component of f( Ml ) be f\ . 

3. The 3 components of the velocity vector, V\, V 2 , V3 are continuous in [— yj— 2$(0), a/— 2$(0)], 
while the components of the spatial vector are continuous in [—N x 5, N x 5] . We set, X\ , X2 , X3 ~ 
W[-^, iV*«5] and Vi, V 2 , V 3 ~ W[-V"2$(0), V" 2$ (0)]- 

4. Draw values of X, X2, X3 from W[— Af x <5, N X S]. Using these chosen values x\, x 2 , X3, we 

obtain the value of the spherical radius r = \J x\ + x\ + x\. Let r be such that it lies in the 

q-fh radial bin, i.e. r £ [(q — 1)5, qS], q = 1, 2, . . . , N x . For this chosen r, we then compute 

/ n / x / n -GM(r) 

<J>(r) using M(r) from Equation 5.8 and the definition $(r) = . We normalise 

r 
3>(r) by $(0), so that $(r) now lives in the range [0, 1]. 

5. Check if the chosen e > 3>(r). If not, go back to step number 2. If yes, then define the 
magnitude of the velocity vector V as a/[2(e — <£(r))], where V = (Vi, V2, V3) . Now, 
Vi, V 2 , V3 ~ £/[— -y/— 23>(0), y/— 2$(0)]. So we draw v\, v 2 , v 3 individually from this uni- 
form distribution. 

6. In this step, we sample from / t using rejection sampling. Here the chosen e is in the 
t-th energy -bin so that / f is the value of /(e) in our discretised model for the chosen 
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AMi) 
e. The rejection sampling is done by checking if t > u or not, where u is a random 

number in [0, 1], u ~ U[0, 1]. Here g(e) is the proposal density function that is chosen to 
envelope over /(e), Ve, and is defined as g(e) = 1.05Ve. This is an adequate choice because 
the phase space pdf /(e) is normalised to be in [0, 1]. If the above inequality holds, we allow 
an integer- valued flag, 7, an increment of 1 and accept the values xi, X2 and V3 as chosen in 
steps (4) and (5) respectively, as the 7-th line in D- f cn> . We iterate over points 2 to 6, until 
7 equals iVj. 

• We input D- f to CHASSIS, to start a new chain. Post burn-in, samples of 9^> vectors generated 
in each iterative step are recorded. It is to be noted that any such recorded 9^' abides by the 
null hypothesis Hi since it is recovered using the input data set D> 9 en ' that is drawn from an 
isotropic phase space density function - one that is leamt from the observed data set Dj under the 
assumption that the phase space is isotropic. In this recorded sample of vales of 9^> , that which 
maximises the posterior [f W ; pW |D,- 9 ], is the null-abiding, posterior-maximising parameter 

0("):=((f(»)) r ,(p(»)) T )) r 

5.4.2. Probability of membership in subspace Tj. We need to identify the sub-space T$ in which live 
model parameter vectors, the posterior of which exceeds the posterior probability density ir(9^*> \Y)f en ). 
We are required to integrate the posterior probability density of such model parameter vectors, over all 
parameters that live in the subspace Tj. Thus, this integral is equal to Pr(0 G Tj|Dj). Thus, implemen- 
tation is possible in this instance of non-parametric, multimodal inference in a high-dimensional state 
space, by approximating this probability of membership in Tj with a case counting scheme in which, 
probability of the membership of 9 in Tj computed, conditioned on the observed data Dj. This is the 
proportion of the model parameter vectors for which 7r(0|Dj) > tt(9^*'\'D\ 9 en ), as recovered in the 
post-burnin stage of chains run with real data Dj. 

Thus, let there be a total of Ai G Z + number of samples of 9 vectors recovered in the post- 
burnin stage in these chains run with data Dj. Out of these, let Bi number of 9 vectors be such that 
7r(0 (i) |Dj) > it{9 { - ii ' ) \D l f en) ). Here, B G Z+, B t < A t . Then the fraction Bi/Ai is an approximation 
to the probability that 9 G Tj, given Dj. Then recalling Equation 5.7, we state 

(5.9) Pr(tfj|D,) = 1 - |i, 

£=1,2. We compute Pr(ff 1 |D 1 ) and Pr(if 2 |D 2 ), and compare the two. 

6. Support for the assumption of isotropy of synthetic phase spaces that data sets are simulated 
from. In this section, we implement the new test to quantify support for the assumption of isotropy, in 
data that have been sampled from synthetic phase spaces. We choose an isotropic phase space density 
function /j so ( x ) v ) an d an anisotropic phase space pdf / an j so (x, v), and sample 2 data sets Dj so and 
Daniso from these phase space density functions, respectively. We choose the sample size of D an j so to 
be 54 while the sample size of Dj so is chosen to be 270, i.e. 5 times that of D an j so . These numbers 
mimic the ratio of sample sizes of the real data - ratio of size of Di to that of D 2 is about 5.6. The 
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chosen parametric forms of the phase space density functions are 

(6-1) /.o(x,v) = ^exp(^L), 



/™(x,v) = -^exp^exp^-H^j 



v 2 /2 + $(r) 
where e = , L = x A v. 

Here, we choose the anisotropy scale r a = 4 in units of kpc (the astronomical unit of length at galactic 
scales) and a = 220 in units of km s _1 . The motivation for these numerical values is not strong; 
familiarity with length and velocity scales similar to those in the Milky Way is the only guiding factor. 
The chosen form of 3>(r) is maintained as same in both definitions of the synthetic phase space pdfs, as 
that of the Plummer model (Binney and Tremaine, 1987), 

(6.2) *(r) = GM ° 



where we chose the parameters to be Mq = 4 x 10 11 times the mass of the Sun and r c =8 kpc. It may be 
noted that Mo and r c represent the total gravitational mass of the synthetic system and the scale length 
of the Plummer model. Again, choices for the numerical values of Mq and r c are motivated by nothing 
more than familiarity with such values that describe our galaxy. G is a known physical constant, the 
universal gravitational constant. 

Having defined the true model, we then simulated Dj so and D an j so from respective phase space 
densities, where each data set contained information on X\, X2 and V3. The sampled V3 data was 
chosen to be characterised by Gaussian noise ~ A/"(0, 20 2 ) which is typical of real-life galaxies that are 
nearby (Douglas et al., 2007). The model parameter vectors p and f learnt by porting T)i so and D an j so 
to the algorithm CHASSIS is shown in black in the right and left panels respectively, of Figure 4. 
The modal values of the learnt components of these model parameter vectors are displayed with open 
circles while the error-bars in the plots depict the 95 % HPDs. Given the lack of information that typifies 



inference in real-life galaxies, here we used uniform priors TTo(ph) = —7^ an d ^o(fj) = 1> 



1 
p^ Uj [10 2 -10' 



where p h is the h-th component of the seed value of the gravitational mass density vector. For the 
implementation of Dj so , we used h = 1, . . . , 19, j = 1, . . . , 9 and for the implementation of D an j so , we 
used h = 1, . . . , 14, j = 1, . . . , 9. 

Thus, the unknowns are leamt under the assumption of phase space isotropy, using data - one of which 
is, and one of which is not sampled from an isotropic phase space density. We refer to the vectors learnt 
using T>i so as p iso and fj so and those learnt using D an j so as p aniso and f a niso- I n order to check for the 
support in these 2 data sets for the assumption of phase space isotropy, we first seek the model parameter 
vector pair fM , p( % *> that maximises the posterior under the null, given the generated data set D 4 . 
Here D 4 is generated by sampling from the modal parameter vectors f( A/j ) at p( Ml \ where i stands 
for "iso" or "aniso" (see Section 5.4 for details of this sampling). Here the null Hi : f W = ^j(e). 

In Figure 5, the logarithm of the posterior probability density 7r(f^ so \ p ( - tso >\'Di so ) and 
^^(amso) ^ pK amso )\Y) aniso } aj- e shown in black in the right and left panels respectively. The maximal 

posterior probability density under the null in the two cases are 7r(f(* so ), p iSO )|D^ ) and 
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Table 2 
Table displaying support in synthetic data T)i ao and Haniso (simulated respectively from known isotropic and anisotropic 

parametric phase space density functions), for the null that the data is sampled from an isotropic phase space density. 

Column 2 shows the ratio Bi/Ai, i.e. the fractional number of iterations in which posterior probability density off^' and 

p^\ given T>i exceeds the maximal posterior probability achieved under the null, given generated data Y)\ 9en ' that has been 

sampled from the modal V- l ' learnt using Di. Here i stands for "iso" or "aniso". Here, the fraction is taken out of the 
total number of iterations in the chain, post-burnin. Column 3 delineates the probability of the i-th null conditional on the i-th 

synthetic data. 



Simulated data D\* ) 


A, IB, 


Pr(Hi|Di) 


-L'iso 
-L' aniso 



1 


1 




^f(amso) p(amso)|j}U?enj^ arg shown in red, in the right and left panels respectively. Here i stands 
for "iso" or "aniso". The fractional number of MCMC samples ((fW) T , {p^) T ) T in the chain run 
with Dj, with posterior probability density values that exceed -k{^ iso \ p tso '\D^ n ), is an empirical 
approximation of the probability for ((fW) T , (p^) T ) T to live in subspace Tj. The complement of this 
probability gives Pr(i/;|Dj) (see Equation 5.7). 

The computed support in Dj for the null Hi is tabulated in Table 2. As is expected, for the data set 
D an j so that is simulated from an isotropy-defying phase space density, the support in the data for the 
null that the data is sampled from an isotropy-assuming phase space pdf, is 0. On the other hand, when 
the implemented data Dj so is truly sampled from an isotropic distribution, the method recovers a support 
in the data for the null that is 1 . That the inference in being performed in a multimodal setting is brought 
out in Figure 6 that depicts the multimodal nature of the marginal posterior of the example parameter 
Pq, learnt using the data set Y) an i so . 

7. Support for the assumption of isotropic phase space of a real galaxy. To present the quan- 
tification of the support in the real, observed data sets Di and D2 each, towards the assumption of an 
isotropic phase space pdf, we recall the results of the Bayesian learning of the model parameters using 
these data sets. These results were presented in Section 3.3. 

Our new test, as described above, is invoked to estimate if the assumption of isotropy is supported, in 
the two different particle samples that we deal with. 

The computed support in the two data sets Di and D2 to have been sampled from isotropic phase 
space pdfs are Pr(i//i|Di) and Pr(i^2|D2) respectively, are represented in Table 3. 

The details of the computational and inferential procedure are as follows. The chains are at least 
400,000 iterations long, when data Dj is used to learn pW and fW, i = 1,2, under the assumption of 
phase space isotropy, in the Bayesian non-parametric method CHASSIS. Uniform priors on f and p are 
used. A sample D> fle , was drawn from the learnt f( Ml ) at the learnt p( Ml \ with i=l,2. The chain runs 
with these generated data D \ , is the same length as the original chains. 

Comparing the computed Pr(i^i|Di) and Pr(f/2|D2) across the chains implies that the assumption 
of isotropy is more likely to be invalid for the phase space from which the PNe data Di are drawn 
than from which the GC data D2 are drawn. Basically, support in real data Di for the assumption of 
isotropy is distinct from that in D2. This implies that the fi(x, v) 7^ f2(x, v), where phase space pdf 
that Di is sampled from is f\{x, v) and D2 ~ f'2{x, v). However, both data sets carry information on 
the phase space coordinates (X T , V T ) T of the same galaxy, i.e. both data sets are sampled from pdfs 
that describe the phase space structure of all or some volume inside the same galactic phase space W. 
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FIG 4. Figure showing the phase space density vector f ^' (left panels) and gravitational mass density vector p % ' (right panels) 
learnt using the synthetic data Di SO ) (lower panels) that is sampled from a known isotropic phase space density and D an i so ) 
(upper panel) that is sampled from a chosen anisotropic phase space density. The true gravitational mass density is presented 
in the solid black lime in the left panels. The true isotropic phase space density that Di SO is sampled from, is shown in the 
lower left panel in the black solid line. The true anisotropic phase space density for the value of the variable L := x A v = 
is depicted in black solid line in the upper left panel. The figure also shows in red, the posterior maximising system parameter 
vectors f < - ! *' and p' , learnt using null-abiding data that are (rejection) sampled from the modal i^ M% > (using p( ') that is 
itself learnt using synthetic data D^. Here i stands for "iso" or "aniso". Modal values of all learnt parameters (components 
of learnt vectors) are represented by open circles while the 95% HPD credible region is shown by the error bar. 
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The log of the joint posterior probability computed using the generated data D^f and D^ ni , are shown in red in the 
right and left panels respectively. Here, the posterior under the null is maximised during chains run with the null-abiding data 
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so , respectively. Here T)i SO has a sample size of 270 and Daniso bears information of 54 particles. 
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Table 3 
Table showing support in data Di for null Hi, i — 1,2, computed using different chains. 



Chain name 


Data set used 


Pr(£Ti|Di) 


PNe - RUN I 


Dx 


0.61 


PNe - RUN II 


Di 


0.58 


PNe - RUN III 


Di 


0.62 


GC - RUN I 


D 2 


0.96 


GC - RUN II 


D 2 


0.96 


GC - RUN III 


D 2 


0.93 



Thus, f\(x, v) ^ /2(x, v) =£- Vi 7^ V2 where /i(x, «) is phase space pd/ defined in volume V\ C W 
and /2(a;, «) is phase space pdf denned in volume V2 C W. In terms of the phase space structure of 
this real galaxy NGC 3379, we can then conclude that the phase space of the system is marked by at 
least two distinct volumes, motions in which do not communicate with each other, leading to distinct 
orbital distributions being set up in these two volumes, which in turn manifests into distinct pdfs for 
these subspaces of the galactic phase space. The PNe and GC samples are drawn from such distinct 
pdfs. 

Of course, such an interpretation would hold up if we can rule out extraneous reasons that might be 
invoked to explain the differential support in Di and D2 to the assumption of phase space isotropy. Such 
extraneous factors are systematically dealt with in Section 8. 

It merits mention that our result that Pr(#2|D2) ~ 1 also suggests that the phase space density that 
the observed GCs in this galaxy live in, is nearly isotropic. 

7.1. The estimate for the whole galaxy. The motivating idea is that if different data sets are sampled 
from different volumes of the galactic phase space, where there is no communication amongst these 
volumes, the gravitational mass density estimates obtained using these data sets will be different. None 
of these individual estimates will however tell us of the gravitational mass density function of the whole 
galaxy, in general. The worrying implication of this is that interpreting one of these estimates of p(x) as 
the galactic estimate, can be completely erroneous. The estimate achieved given one data set will then 
reveal no more about the entire galaxy than the very isolated phase space volume from which these data 
on particle motions are sampled. 

Ideally speaking, the gravitational mass density of the galaxy, at any |x| can be is approximated as 
max{pi(x), p2(x), . . . , p n {x)} where Pi(x) is the estimate obtained using the i-th data set, i = 1, . . . , n, 
where there are n available data sets that have been drawn from n mutually insulated volumes or sub- 
spaces {Vi}f =1 within the galactic phase space, namely from Vi C VV, V2 C W, . . . , V n C W. The 
more the number of such data sets available, i.e. bigger is n, the better is the approximation suggested 
above. However, in this framework we are always running the risk of misidentifying the estimate of a 
property of a subspace of the galaxy as that of the galactic property. 

Here, the z-th data set D,; ~ /i(x, v) where /i(x, v) is the pdf that describes sub-space Vi C W. 
Then the statement that Vi and Vj are "mutually insulated" implies that motions in Vi do not cross over 
into Vj and vice versa, resulting in /j(x, v) ^ /j(x, v), j = 1, 2, , . . . , n. Thus, Di, D2, . . . , bfD n 
are not identically distributed. Hence collating all data sets {Dj}" =1 , as a single input into any method 
that attempts learning the galactic gravitational mass density by invoking phase space densities, will 
not work, if the method demands the data to be iid. (All methods of learning p(x) are underlined by 
the need to invoke the phase space pdf). Still, if we are in possession of knowledge of the correlation 
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Fig 7. Right panel: gravitational mass density vector pS 1 ' (in black) recovered from chain GC — RUN I run 
using data D2 and p^ 1 ' from chain PNe — RUN I run using Di (in red). These gravitational mass density 
results were obtained under the assumption of an isotropic phase space, the support for which in the two data sets 
is indicated in Table 2. Overlaid on these are the identified vectors p' 1 *' (in blue) and p~ - ' (in green), which are 
respectively, the posterior-maximising gravitational mass density vectors identified in chains run with data D^ 9 
and Dg . These data sets are (rejection) sampled from the learnt modal phase space density vectors f ( M1 ) and 
f( M2 ) respectively. The concurrence of p^ 2 ' and p( 2 *> is noted, along with the lack of consistency between p^ 1 ' 
and p' 1 *-*. In the left panel, the phase space density vectors f W (in red) and f ( 2 ' (in black), learnt from the chains 
PNe — RUN I and GC — RUN I, are shown, compared respectively to V l *> (in blue) and V 2 *> (in green). 
Again, the overlap off( 2 ' and f ( 2 *' is noted, as is the discord between f W and f ( l *\ especially at higher energies. 
The f vectors are normalised to unity at e = 1. 



between the pdfs /j(x, v) and /,(x, v), we could use the collated data set in such a method to learn the 
galactic mass density. However, as there is absolutely no measured, simulated or theoretical information 
available on the correlation structure between mutually insulated sub-volumes inside galaxies, such a 
modelling strategy is untenable. Indeed, the development of any such learning strategy, can imply all 
n available data sets jointly, but will be highly sensitive to the non-linear dynamical modelling of the 
phase space of the particular galaxy in question. At this stage we recall that for the majority of galaxies, 
n=l and n is at most 2 for a few galaxies; examples of these systems include NGC 3379 (Bergond et al., 
2006; Douglas et al., 2007), CenA (Woodley & Chakrabarty, in preparation). 

Thus, we can put a lower bound on the total gravitational mass content of the galaxy, as we now 
present for NGC 3379. NGC 3379 is advanced as a dark matter rich galaxy, with the gravitational mass 
inside a radius of about 20 kpc to be at least as high as about 4 to lOx 10 11 M Q , where M denotes the 
mass of the Sun and the astronomical unit of length, kiloparsec, is abbreviated as kpc. 

8. Discussions. In the above test, a high support in the GC sample towards an isotropic phase space 
pdf, along with a moderate support in the sampled PNe for the same assumption, indicate that the two 
samples are drawn from two distinct phase space density functions. 

The expectation that the implementation of the PNe and GC data sets will lead to concurring gravi- 
tational mass density estimates is foreshadowed by the assumption that both data sets are sampled from 
the same - namely, the galactic - phase space density /(x, v). The apparent motivation behind this as- 
sumption is that since both samples live in the galactic phase space W, they are expected to be sampled 
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from the same galactic phase space density, at the galactic gravitational potential. However, such does 
not necessarily follow if - for example - /(x, v) is a non-analytic function: 

(8.1) /(x,v) = / p (x,v), V(x T ,v T ) T GV p CW, p = l,...,p max 

Then, if the GC data D2 are sampled from the density /^(-, •) and PNe data Di ~ f v (-, •), where 
77, C G {1, • • • ,Pmax}, V 7^ C> then the statement that both the observed samples are drawn from equal 
phase space densities is erroneous. If this assumption is erroneous, i.e. if observed data Di and D2 are 
sampled from unequal phase space pdfs, 

- firstly it implies that the phase portrait of the galaxy NGC 3379 is split into at least two sub-spaces 
such that the observed GCs live in a sub-space V( C VV and the observed PNe live in a distinct 
sub-space V v C W, 77 7^ £. 

- secondly that the phase space densities that describe V v and V^ are unequal. 

Qualitatively we understand that if the galactic phase space W is split into isolated volumes, such that 
the motions in these volumes do not mix and are therefore distinctly distributed in general, the phase 
space densities of these volumes would be unequal. This is synonymous to saying that W is marked by 
at least two distinct basins of attraction and the two observed samples reside in such distinct basins. 

Thus, a split W will readily explain lack of consistency in the estimate of support in the two data sets 
for the null that the phase space density function that these two data sets are drawn from are isotropic. 
However, the fundamental question is really about the inverse of this statement. Does differential support 
in Di and D2 to isotropic pase space pdfs necessarily imply a split W? Indeed it does, since differential 
support in Di and D2 towards isotropy of the pdf that describes the respective native phase space 
Vi C VV and V2 C VV, implies that the distribution of the phase space vector (x T , v T ) T in the sub- 
spaces Vi and V2 are distinct. This in turn implies that the phase portrait of this galaxy manifests at 
least two volumes, the motions in which are isolated from each other. Such separation of the motions 
is possible if these distinct sub-spaces that the two datasets reside in, are separated by separatrices 
(Thompson and Stewart, 2001) into isolated volumes, motions across which do not mix. 

As for a physical reason for the phase space distribution of the PNe to be less isotropic than the 
GC population, we can only speculate at the level of this paper. The GC population being an older 
component of a galaxy (than the PNe population), might have had longer time to equilibrate towards 
an isotropic distribution. Also, PNe being end states of stars, the PNe population is likely to reside in a 
flatter component inside the galaxy, than the GC population. 

8.1. What could cause a split galactic phase space. Galactic phase spaces can be split given that 
a galaxy is expectedly a complex system, built of multiple components with independent evolutionary 
histories and distinct dynamical timescales. As an example, at least in the neighbourhood of the Sun, 
the phase space structure of the Milky Way is highly multi-modal and the ensuing dynamics is highly 
non-linear, marked by significant chaoticity. The standard causes for the splitting of VV include the 
development of basins of attraction leading to attractors, generated in amultistable galactic gravitational 
potential. Basins of attraction could also be triggered around chaotic attractors, which in turn could be 
due to resonance interaction with external perturbers or due to merging events in the evolutionary history 
of the galaxy. 

8.2. Assuming spherical geometry. One worry that astronomers have expressed in the literature 
about the galaxy NGC 3379 is that the spatial geometry of the system is triaxial and not spherical. For 
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us, the relevant question to ask is if there is support in the data for the consideration of the gravitational 
mass density to depend on the spherical radius r alone. In our work, the fact that the methodology 
assumes the gravitational mass density to be dependent on r 2 :=x • x(= x\ + x\ + x|) is a manifestation 
of the broader assumption of phase space isotropy (see Section 3.1). Thus, our test for phase space 
isotropy includes testing for the assumption that the gravitational mass density of the galaxy NGC 3379 
bears a dependence on the spatial coordinates Xj, only via r,j = 1,2, 3. In other words, we have already 
tested for the assumption that the gravitational mass density depends on the components of the spatial 
vector, via the spherical radius r. 

8.3. True Mass Distribution. In contrast to the PNe and GC samples observed in this galaxy, if two 
observed sets of galactic particles can be inferred to have been drawn from the same phase space density, 
we will expect consistency in the gravitational matter density that is recovered by using such data sets 
in a mass determination formalism. At the end of the discussion presented above, we will naturally want 
to know what the true gravitational mass density function of NGC 3379. However, if distinct density 
estimates are available at a given radius, from n independent data sets as {pi(r),p2(r), . . . , p n {r)}, 
then the lower limit on the galactic gravitational mass density at r is sup{pi(r), p2(f), • • • , p n {r)}- 

8.4. Risk of using particle kinematics. The above results and arguments suggest that it is inher- 
ently risky to refer to the gravitational mass density recovered using an observed particle sample - and 
the gravitational potential computed therefrom - as the gravitational potential of the galaxy. We have 
demonstrated this with the example of NGC 3379 and shown that inconsistencies in p(r) learnt using 
distinct observed samples, cannot be attributed to any other factor except that these observed samples 
are drawn from distinct and insular sub-spaces within the galactic phase space, and/or the lack of time- 
independence in the gravitational mass density or phase space density function. 

SUPPLEMENTARY MATERIAL 

Supplement A: Motion traces gravitational field - the Earth-Sun system 

(). That motion tracks gravitational field is not entirely alien to our experience - after all, the revolution of 
the Earth around the Sun, at a speed of V® and radius R®, allows for an estimate of the solar gravitational 
mass Mq via Newton's law, as V^ = GM & /R^, where G is a known constant and the assumption of a 
circular orbit is made. In contrast to this, when the topology of the orbits in phase space is unknown, a 
closed-form relation between the orbital speed and gravitational mass enclosed within the orbit cannot 
be written. Then, in the absence of information about the topology of the orbits of galactic particles, 
a parametric approximation for the probability density function of W maybe too naive a model if the 
system manifests even moderate complexity. 

Supplement B: Comparison with the conventional point of view 

(). To sum up the results obtained above, we state that in general, kinematic data drawn from distinct 
phase space densities will yield distinct gravitational mass density functions and this is not solely be- 
cause of the lack of sufficient information to help constrain the state of anisotropy in the phase space 
of the system. Here we illustrate the possibility that even when phase space anisotropy parameters - as 
defined in astronomical literature - are equal given unequal phase space pdfs, the learnt gravitational 
mass estimates will be in general be unequal. On the very outset this is not surprising at all - after 
all, equal anisotropy parameters suggest equal ratios of the second order moments of the phase space 
pdf, which is of course not suggestive of equal phase space pdfs in general. The illustration is per- 
formed using the method that is conventionally used to learn gravitational mass enclosed within radius 
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r, M{r) = Jq p(s)A-Ks 2 ds, is the Jeans equation that implements the (typically smoothed) empirical 
dispersion 03 of the values of V3 of the galactic particles in an observed sample, and the number density 
function of the sampled particles, in the assumed spherical spatial geometry, i.e. the number of particles 
v{r) that lie in the interval (r, r + 5r] where 5r is the width of the radial bin chosen by the astronomer. 
Jeans equation gives 



(8.2) 



M(r) 



G 



dlnu(r) din a 2 



dlnr 



+ 



dlnr 



(3(r) 



where the anisotropy parameter is defined as /3(r) = 1 — a^/fJ 3 . (Since /?(■) is a function of r, we 
refer to it as the anisotropy function rather than the anisotropy parameter from now). Then we see that 
it is possible to obtain distinct M(r) using two data sets that are drawn from two distinct phase space 
distributions that are equally anisotropic (as parameterised by the anisotropy parameter /3(r)) - then 
dlna^{r)/dlnr and dlnv{r)/dlnr terms would in general be different in the two cases, even if /3(r) 
is the same. This follows from the fact that the second moments (leading to the dispersion 03 (r)) and 
the zeroth moments (leading to u{r)) of unequal phase space density functions are unequal in general, 
even if the anisotropy functions are equal. Thus, for example, if two observed samples are drawn from 
phase space densities, each of which is isotropic, but are unequal functions, the anisotropy functions 
will be identically zero in each case, but the second and zeroth order moments would not necessarily 
enjoy equal rate of change with lnr, i.e. the M(r) learnt using the two samples will be inconsistent 
even in this case. Thus, unequal gravitational mass density estimates necessarily imply distinct phase 
space density functions that the observed data are sampled from, but do not necessarily imply unequal 
anisotropy functions. 
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